GNU Info File | 1994-08-02 | 47.1 KB | 1,095 lines
This is Info file elisp, produced by Makeinfo-1.55 from the input file
elisp.texi.
This version is newer than the second printed edition of the GNU
Emacs Lisp Reference Manual. It corresponds to Emacs Version 19.19.
Published by the Free Software Foundation 675 Massachusetts Avenue
Cambridge, MA 02139 USA
Copyright (C) 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU Emacs General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU Emacs General Public
License" may be included in a translation approved by the Free Software
Foundation instead of in the original English.
File: elisp, Node: Examining Properties, Next: Changing Properties, Up: Text Properties
Examining Text Properties
-------------------------
The simplest way to examine text properties is to ask for the value
of a particular property of a particular character. For that, use
`get-text-property'. Use `text-properties-at' to get the entire
property list of a character. *Note Property Search::, for functions
to examine the properties of a number of characters at once.
These functions handle both strings and buffers. Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.
- Function: get-text-property POS PROP &optional OBJECT
This function returns the value of the PROP property of the
character after position POS in OBJECT (a buffer or string). The
argument OBJECT is optional and defaults to the current buffer.
If there is no PROP property strictly speaking, but the character
has a category which is a symbol, then `get-text-property' returns
the PROP property of that symbol.
- Function: text-properties-at POSITION &optional OBJECT
This function returns the list of properties held by the character
at POSITION in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
- Function: text-property-any START END PROP VALUE &optional OBJECT
This function returns non-`nil' if at least one character between
START and END has a property PROP whose value is VALUE. More
precisely, it returns the position of the first such character.
Otherwise, it returns `nil'.
The optional fifth argument, OBJECT, specifies the string or
buffer to scan. Positions are relative to OBJECT.
- Function: text-property-not-all START END PROP VALUE &optional OBJECT
This function returns non-`nil' if at least one character between
START and END has a property PROP whose value differs from VALUE.
More precisely, it returns the position of the first such
character. Otherwise, it returns `nil'.
The optional fifth argument, OBJECT, specifies the string or
buffer to scan. Positions are relative to OBJECT.
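As a minimal sketch of the four functions above, the following applies
a property to part of a string and then examines it. The function name
`my-examine-demo' and the `comment' property are illustrations only,
and `put-text-property' is described in the next node. Because OBJECT
is a string here, positions start from 0.

```elisp
(defun my-examine-demo ()
  "Hypothetical demo: examine a `comment' property on a string."
  (let ((s (copy-sequence "hello world")))
    ;; Mark "hello" (string positions 0 through 4).
    (put-text-property 0 5 'comment t s)
    (list (get-text-property 0 'comment s)              ; t
          (get-text-property 6 'comment s)              ; nil
          (text-properties-at 0 s)                      ; (comment t)
          (text-property-any 0 11 'comment t s)         ; 0
          (text-property-not-all 0 11 'comment t s))))  ; 5

(my-examine-demo)
;; => (t nil (comment t) 0 5)
```

The last value is 5 because the space after "hello" is the first
character whose `comment' property differs from `t'.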
File: elisp, Node: Changing Properties, Next: Property Search, Prev: Examining Properties, Up: Text Properties
Changing Text Properties
------------------------
The primitives for changing properties apply to a specified range of
text. The function `set-text-properties' (see end of section) sets the
entire property list of the text in that range; more often, it is
useful to add, change, or delete just certain properties specified by
name.
Since text properties are considered part of the buffer's contents,
and can affect how the buffer looks on the screen, any change in the
text properties is considered a buffer modification. Buffer text
property changes are undoable.
- Function: add-text-properties START END PROPS &optional OBJECT
This function modifies the text properties for the text between
START and END in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
The argument PROPS specifies which properties to change. It
should have the form of a property list (*note Property Lists::.):
a list whose elements include the property names followed
alternately by the corresponding values.
The return value is `t' if the function actually changed some
property's value; `nil' otherwise (if PROPS is `nil' or its values
agree with those in the text).
For example, here is how to set the `comment' property to `t' for
a range of text:
(add-text-properties (region-beginning)
                     (region-end)
                     (list 'comment t))
- Function: put-text-property START END PROP VALUE &optional OBJECT
This function sets the PROP property to VALUE for the text between
START and END in the string or buffer OBJECT. If OBJECT is `nil',
it defaults to the current buffer.
- Function: remove-text-properties START END PROPS &optional OBJECT
This function deletes specified text properties from the text
between START and END in the string or buffer OBJECT. If OBJECT
is `nil', it defaults to the current buffer.
The argument PROPS specifies which properties to delete. It
should have the form of a property list (*note Property Lists::.):
a list whose elements include the property names followed by the
corresponding values. The property names mentioned in PROPS are
the ones deleted from the text. The values associated in PROPS
with these names do not matter.
The return value is `t' if the function actually changed some
property's value; `nil' otherwise (if PROPS is `nil' or if none of
the text had any of those properties).
- Function: set-text-properties START END PROPS &optional OBJECT
This function completely replaces the text property list for the
text between START and END in the string or buffer OBJECT. If
OBJECT is `nil', it defaults to the current buffer.
The argument PROPS is the new property list. It should have the
form of a list whose elements include the property names followed
by the corresponding values.
After `set-text-properties' returns, all the characters in the
specified range have identical properties.
If PROPS is `nil', the effect is to get rid of all properties from
the specified range of text. Here's an example:
(set-text-properties (region-beginning)
                     (region-end)
                     nil)
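The return values described above can be checked on a string, which
avoids touching any buffer. This is only a sketch; the function name
`my-change-props-demo' and the property names are hypothetical.

```elisp
(defun my-change-props-demo ()
  "Hypothetical demo: add, remove, and reset properties on a string."
  (let ((s (copy-sequence "abcdef")))
    (list
     ;; Something changes, so the value is t.
     (add-text-properties 0 3 '(comment t face bold) s)
     ;; The text already has this value, so the value is nil.
     (add-text-properties 0 3 '(comment t) s)
     ;; Only the names in PROPS matter; the value nil is ignored.
     (remove-text-properties 0 3 '(face nil) s)
     ;; Only `comment' is left on the first character.
     (text-properties-at 0 s)
     ;; A nil PROPS removes every property in the range.
     (progn (set-text-properties 0 6 nil s)
            (text-properties-at 0 s)))))

(my-change-props-demo)
;; => (t nil t (comment t) nil)
```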
File: elisp, Node: Property Search, Next: Special Properties, Prev: Changing Properties, Up: Text Properties
Property Search Functions
-------------------------
In typical use of text properties, most of the time several or many
consecutive characters have the same value for a property. Rather than
writing your programs to examine characters one by one, it is much
faster to process chunks of text that have the same property value.
Here are functions you can use to do this. In all cases, OBJECT
defaults to the current buffer.
- Function: next-property-change POS &optional OBJECT
This function scans the text forward from position POS in the
string or buffer OBJECT till it finds a change in some text
property, then returns the position of the change. In other
words, it returns the position of the first character beyond POS
whose properties are not identical to those of the character just
after POS.
The value is `nil' if the properties remain unchanged all the way
to the end of OBJECT. If the value is non-`nil', it is a position
greater than POS, never equal.
Here is an example of how to scan the buffer by chunks of text
within which all properties are constant:
(while (not (eobp))
  (let ((plist (text-properties-at (point)))
        (next-change
         (or (next-property-change (point) (current-buffer))
             (point-max))))
    ;; Process text from point to NEXT-CHANGE...
    (goto-char next-change)))
- Function: next-single-property-change POS PROP &optional OBJECT
This function scans the text forward from position POS in the
string or buffer OBJECT till it finds a change in the PROP
property, then returns the position of the change. In other
words, it returns the position of the first character beyond POS
whose PROP property differs from that of the character just after
POS.
The value is `nil' if the PROP property remains unchanged all the way
to the end of OBJECT. If the value is non-`nil', it is a position
greater than POS, never equal.
- Function: previous-property-change POS &optional OBJECT
This is like `next-property-change', but scans back from POS
instead of forward. If the value is non-`nil', it is a position
always strictly less than POS.
- Function: previous-single-property-change POS PROP &optional OBJECT
This is like `next-single-property-change', but scans back from POS
instead of forward. If the value is non-`nil', it is a position
always strictly less than POS.
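The same chunk-by-chunk scanning works for a single property in a
string. The following sketch collects the runs over which one property
is constant; the name `my-property-runs' is hypothetical, and the end
of the string stands in when `next-single-property-change' returns
`nil'.

```elisp
(defun my-property-runs (string prop)
  "Return a list of (START END VALUE) runs of PROP in STRING.
Hypothetical helper, not part of Emacs."
  (let ((pos 0)
        (len (length string))
        (runs nil))
    (while (< pos len)
      (let ((next (or (next-single-property-change pos prop string)
                      len)))
        (setq runs (cons (list pos next
                               (get-text-property pos prop string))
                         runs))
        (setq pos next)))
    (nreverse runs)))

(let ((s (copy-sequence "hello world")))
  (put-text-property 0 5 'face 'bold s)
  (my-property-runs s 'face))
;; => ((0 5 bold) (5 11 nil))
```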
File: elisp, Node: Special Properties, Next: Sticky Properties, Prev: Property Search, Up: Text Properties
Special Properties
------------------
If a character has a `category' property, we call it the "category"
of the character. It should be a symbol. The properties of the symbol
serve as defaults for the properties of the character.
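As a small sketch of this defaulting, the following gives a category
symbol a `face' property and then asks a character for its `face'.
The names `my-category-demo' and `my-note-category' are hypothetical.

```elisp
(defun my-category-demo ()
  "Hypothetical demo: a category symbol supplies a default property."
  (let ((s (copy-sequence "note")))
    ;; The symbol's own property list holds the default.
    (put 'my-note-category 'face 'bold)
    (put-text-property 0 4 'category 'my-note-category s)
    (list (get-text-property 0 'face s)   ; found through the category
          (text-properties-at 0 s))))     ; only `category' is stored

(my-category-demo)
;; => (bold (category my-note-category))
```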
You can use the property `face' to control the font and color of
text. *Note Faces::, for more information. This feature is temporary;
in the future, we may replace it with other ways of specifying how to
display text.
The property `mouse-face' is used instead of `face' when the mouse
is on or near the character. For this purpose, "near" means that all
text between the character and the mouse position has the same
`mouse-face' property value.
You can specify a different keymap for a portion of the text by means
of a `local-map' property. The property's value, for the character
after point, replaces the buffer's local map. *Note Active Keymaps::.
If a character has the property `read-only', then modifying that
character is not allowed. Any command that would do so gets an error.
Insertion next to a read-only character is also an error if inserting
ordinary text there would inherit the `read-only' property due to
stickiness. Thus, you can control permission to insert next to
read-only text by controlling the stickiness. *Note Sticky
Properties::.
Since changing properties counts as modifying the buffer, it is not
possible to remove a `read-only' property unless you know the special
trick: bind `inhibit-read-only' to a non-`nil' value and then remove
the property. *Note Read Only Buffers::.
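The trick can be sketched as follows. The function name
`my-read-only-demo' is hypothetical, and the example assumes an Emacs
that provides `with-temp-buffer', which may be more recent than the
version this manual describes.

```elisp
(defun my-read-only-demo ()
  "Hypothetical demo: protect text, then lift the protection."
  (with-temp-buffer
    (insert "locked text")
    (put-text-property 1 7 'read-only t)
    (let ((blocked
           ;; Deleting read-only text signals an error.
           (condition-case nil
               (progn (delete-region 1 3) nil)
             (error t))))
      ;; Removing the property is itself a modification, so it is
      ;; allowed only while `inhibit-read-only' is non-nil.
      (let ((inhibit-read-only t))
        (remove-text-properties 1 7 '(read-only nil)))
      (delete-region 1 3)
      (list blocked (buffer-string)))))

(my-read-only-demo)
;; => (t "cked text")
```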
A non-`nil' `invisible' property means a character does not appear
on the screen. This works much like selective display. Details of
this feature are likely to change in future versions, so check the
`etc/NEWS' file in the version you are using.
If a character has the property `modification-hooks', then its value
should be a list of functions; modifying that character calls all of
those functions. Each function receives two arguments: the beginning
and end of the part of the buffer being modified. Note that if a
particular modification hook function appears on several characters
being modified by a single primitive, you can't predict how many times
the function will be called.
Insertion of text does not, strictly speaking, change any existing
character, so there is a special rule for insertion. It compares the
`read-only' properties of the two surrounding characters; if they are
non-`nil' and `eq' to each other, then the insertion is not allowed.
Assuming insertion is allowed, it then calls the functions listed in
the `insert-in-front-hooks' property of the following character and in
the `insert-behind-hooks' property of the preceding character. These
functions receive two arguments, the beginning and end of the inserted
text.
See also *Note Change Hooks::, for other hooks that are called when
you change text in a buffer.
The special properties `point-entered' and `point-left' record hook
functions that report motion of point. Each time point moves, Emacs
compares these two property values:
* the `point-left' property of the character after the old location,
and
* the `point-entered' property of the character after the new
location.
If these two values differ, each of them is called (if not `nil') with
two arguments: the old value of point, and the new one.
The same comparison is made for the characters before the old and new
locations. The result may be to execute two `point-left' functions
(which may be the same function) and/or two `point-entered' functions
(which may be the same function). The `point-left' functions are
always called before the `point-entered' functions.
A primitive function may examine characters at various positions
without moving point to those positions. Only an actual change in the
value of point runs these hook functions.
- Variable: inhibit-point-motion-hooks
When this variable is non-`nil', `point-left' and `point-entered'
hooks are not run.
File: elisp, Node: Sticky Properties, Next: Not Intervals, Prev: Special Properties, Up: Text Properties
Stickiness of Text Properties
-----------------------------
Inserting a string with no text properties into the buffer normally
gives the inserted text the same properties as the preceding character.
You can control this copying of properties by setting the
`front-sticky' and `rear-nonsticky' properties of a character.
If you make a character's `front-sticky' property `t', then
insertion before the character receives its properties. If you make the
`rear-nonsticky' property `t', then insertion after that character does
*not* receive its properties. You can regard characters as being
"rear-sticky" by default, but not "front-sticky"; thus, unless you
specify otherwise, insertion receives properties from the previous
character only.
If neither side of an insertion is suitably sticky, then the inserted
text gets no properties. If both sides are sticky, then the inserted
text gets the properties of both sides, with the previous character's
properties taking precedence when both sides have a property in common.
You can also specify stickiness for individual properties. To do so,
use a list of property names as the value of the `front-sticky'
property or the `rear-nonsticky' property. For example, if a character
has a `rear-nonsticky' property whose value is `(face read-only)', then
insertion after the character does not receive its `face' property or
its `read-only' property (if any), but does receive any other properties
it has.
The merging of properties when both sides of the insertion are sticky
takes place one property at a time. If the preceding character is
`rear-sticky' for the property, and the property is non-`nil', it
dominates. Otherwise, the following character's property value is used
if it is `front-sticky' for that property.
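The following sketch shows default rear-stickiness and the effect of
`rear-nonsticky'. It rests on two assumptions that do not come from
this section: that `with-temp-buffer' is available, and that
inheritance is requested explicitly with `insert-and-inherit', as in
later Emacs versions where plain `insert' does not inherit. The
function name `my-sticky-demo' is hypothetical.

```elisp
(defun my-sticky-demo ()
  "Hypothetical demo: inheritance with and without `rear-nonsticky'."
  (with-temp-buffer
    (insert "abXY")
    (put-text-property 1 3 'face 'bold)     ; "ab" is bold
    (goto-char 3)
    ;; Assumption: `insert-and-inherit' applies the stickiness rules.
    (insert-and-inherit "c")                ; follows rear-sticky "b"
    (let ((inherited (get-text-property 3 'face)))
      (put-text-property 3 4 'rear-nonsticky t)
      (insert-and-inherit "d")              ; follows nonsticky "c"
      (list inherited
            (get-text-property 4 'face)
            (buffer-string)))))

(my-sticky-demo)
;; => (bold nil "abcdXY")
```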
File: elisp, Node: Not Intervals, Prev: Sticky Properties, Up: Text Properties
Why Text Properties are not Intervals
-------------------------------------
Some editors that support adding attributes to text in the buffer do
so by letting the user specify "intervals" within the text, and adding
the properties to the intervals. Those editors permit the user or the
programmer to determine where individual intervals start and end. We
deliberately provided a different sort of interface in Emacs Lisp to
avoid certain paradoxical behavior associated with text modification.
If the actual subdivision into intervals is meaningful, that means
you can distinguish between a buffer that is just one interval with a
certain property, and a buffer containing the same text subdivided into
two intervals, both of which have that property.
Suppose you take the buffer with just one interval and kill part of
the text. The text remaining in the buffer is one interval, and the
copy in the kill ring (and the undo list) becomes a separate interval.
Then if you undo the kill, you get two intervals with the same
properties. Thus, the distinction can't be preserved when editing
happens.
But suppose we "fix" this problem by coalescing the two intervals
when the text is inserted. That works fine if the buffer originally was
a single interval. But if it was two intervals, and the killed text
equals one of them, then undoing the kill yields just one interval.
Again, the distinction can't be preserved.
Insertion of text at the border between intervals also raises
questions that have no satisfactory answer.
However, it is easy to arrange for editing to behave consistently for
questions of the form, "What are the properties of this character?" So
we have decided these are the only questions that make sense; we have
not implemented asking questions about where intervals start or end.
For practical purposes, the property search functions serve in place
of explicit interval boundaries. You can think of them as finding the
boundaries of intervals, assuming that intervals are always coalesced
whenever possible. *Note Property Search::.
Emacs also provides explicit intervals as a presentation feature; see
*Note Overlays::.
File: elisp, Node: Substitution, Next: Underlining, Prev: Text Properties, Up: Text
Substituting for a Character Code
=================================
The following functions replace characters within a specified region
based on their character codes.
- Function: subst-char-in-region START END OLD-CHAR NEW-CHAR &optional NOUNDO
This function replaces all occurrences of the character OLD-CHAR
with the character NEW-CHAR in the region of the current buffer
defined by START and END.
If NOUNDO is non-`nil', then `subst-char-in-region' does not
record the change for undo and does not mark the buffer as
modified. This feature is useful for changes which are not
considered significant, such as when Outline mode changes visible
lines to invisible lines and vice versa.
`subst-char-in-region' does not move point and returns `nil'.
---------- Buffer: foo ----------
This is the contents of the buffer before.
---------- Buffer: foo ----------
(subst-char-in-region 1 20 ?i ?X)
=> nil
---------- Buffer: foo ----------
ThXs Xs the contents of the buffer before.
---------- Buffer: foo ----------
- Function: translate-region START END TABLE
This function applies a translation table to the characters in the
buffer between positions START and END.
The translation table TABLE is a string; `(aref TABLE OCHAR)'
gives the translated character corresponding to OCHAR. If the
length of TABLE is less than 256, any characters with codes larger
than the length of TABLE are not altered by the translation.
The return value of `translate-region' is the number of characters
which were actually changed by the translation. This does not
count characters which were mapped into themselves in the
translation table.
This function is available in Emacs versions 19 and later.
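Here is a minimal sketch of building and applying a translation table.
The table is 128 characters long, starts as the identity mapping, and
then maps `a' and `e' to upper case. The function name
`my-translate-demo' is hypothetical, and `with-temp-buffer' is assumed
to be available.

```elisp
(defun my-translate-demo ()
  "Hypothetical demo: upcase `a' and `e' with a translation table."
  (with-temp-buffer
    (insert "banana bread")
    (let ((table (make-string 128 0))
          (i 0))
      ;; Start from the identity mapping for codes 0 through 127.
      (while (< i 128)
        (aset table i i)
        (setq i (1+ i)))
      (aset table ?a ?A)
      (aset table ?e ?E)
      (let ((changed (translate-region (point-min) (point-max) table)))
        (list changed (buffer-string))))))

(my-translate-demo)
;; => (5 "bAnAnA brEAd")
```

The count is 5 because four `a's and one `e' were changed; characters
that the table maps to themselves are not counted.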
File: elisp, Node: Underlining, Next: Registers, Prev: Substitution, Up: Text
Underlining
===========
The underlining commands are somewhat obsolete. The
`underline-region' function actually inserts `_^H' before each
appropriate character in the region. This command provides a minimal
text formatting feature that might work on your printer; however, we
recommend instead that you use more powerful text formatting facilities,
such as Texinfo.
- Command: underline-region START END
This function underlines all nonblank characters in the region
defined by START and END. That is, an underscore character and a
backspace character are inserted just before each non-whitespace
character in the region. The backspace characters are intended to
cause overstriking, but in Emacs they display as either `\010' or
`^H', depending on the setting of `ctl-arrow'. There is no way to
see the effect of the overstriking within Emacs. The value is
`nil'.
- Command: ununderline-region START END
This function removes all underlining (overstruck underscores) in
the region defined by START and END. The value is `nil'.
File: elisp, Node: Registers, Next: Change Hooks, Prev: Underlining, Up: Text
Registers
=========
A register is a sort of variable used in Emacs editing that can hold
a marker, a string, a rectangle, a window configuration (of one frame),
or a frame configuration (of all frames). Each register is named by a
single character. All characters, including control and meta characters
(but with the exception of `C-g'), can be used to name registers.
Thus, there are 255 possible registers. A register is designated in
Emacs Lisp by a character which is its name.
The functions in this section return unpredictable values unless
otherwise stated.
- Variable: register-alist
This variable is an alist of elements of the form
`(NAME . CONTENTS)'. Normally, there is one element for each Emacs
register that has been used.
The object NAME is a character (an integer) identifying the
register. The object CONTENTS is a string, marker, or list
representing the register contents. A string represents text
stored in the register. A marker represents a position. A list
represents a rectangle; its elements are strings, one per line of
the rectangle.
- Command: view-register REG
This command displays what is contained in register REG.
- Function: get-register REG
This function returns the contents of the register REG, or `nil'
if it has no contents.
- Function: set-register REG VALUE
This function sets the contents of register REG to VALUE. A
register can be set to any value, but the other register functions
expect only certain data types. The return value is VALUE.
- Command: point-to-register REG
This command stores both the current location of point and the
current buffer in register REG as a marker.
- Command: jump-to-register REG
- Command: register-to-point REG
This command restores the status recorded in register REG.
If REG contains a marker, it moves point to the position stored in
the marker. Since both the buffer and the location within the
buffer are stored by the `point-to-register' function, this
command can switch you to another buffer.
If REG contains a window configuration or a frame configuration,
`jump-to-register' restores that configuration.
- Command: insert-register REG &optional BEFOREP
This command inserts contents of register REG into the current
buffer.
Normally, this command puts point before the inserted text, and the
mark after it. However, if the optional second argument BEFOREP
is non-`nil', it puts the mark before and point after. You can
pass a non-`nil' second argument BEFOREP to this function
interactively by supplying any prefix argument.
If the register contains a rectangle, then the rectangle is
inserted with its upper left corner at point. This means that
text is inserted in the current line and underneath it on
successive lines.
If the register contains something other than saved text (a
string) or a rectangle (a list), currently useless things happen.
This may be changed in the future.
- Command: copy-to-register REG START END &optional DELETE-FLAG
This command copies the region from START to END into register
REG. If DELETE-FLAG is non-`nil', it deletes the region from the
buffer after copying it into the register.
- Command: prepend-to-register REG START END &optional DELETE-FLAG
This command prepends the region from START to END to the text already in register
REG. If DELETE-FLAG is non-`nil', it deletes the region from the
buffer after copying it to the register.
- Command: append-to-register REG START END &optional DELETE-FLAG
This command appends the region from START to END to the text
already in register REG. If DELETE-FLAG is non-`nil', it deletes
the region from the buffer after copying it to the register.
- Command: copy-rectangle-to-register REG START END &optional DELETE-FLAG
This command copies a rectangular region from START to END into
register REG. If DELETE-FLAG is non-`nil', it deletes the region
from the buffer after copying it to the register.
- Command: window-configuration-to-register REG
This function stores the window configuration of the selected
frame in register REG.
- Command: frame-configuration-to-register REG
This function stores the current frame configuration in register
REG.
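As a sketch of the text-related register functions working together,
the following copies text into register `a', appends to it, and
inserts the result. The function name `my-register-demo' is
hypothetical, `with-temp-buffer' is assumed to be available, and the
example overwrites whatever register `a' held before.

```elisp
(defun my-register-demo ()
  "Hypothetical demo: copy, append, and insert with register `a'."
  (with-temp-buffer
    (insert "hello world")
    (copy-to-register ?a 1 6)          ; register holds "hello"
    (append-to-register ?a 6 12)       ; now "hello world"
    (goto-char (point-max))
    ;; A non-nil second argument leaves point after the inserted text.
    (insert-register ?a t)
    (list (get-register ?a) (buffer-string))))

(my-register-demo)
;; => ("hello world" "hello worldhello world")
```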
File: elisp, Node: Change Hooks, Prev: Registers, Up: Text
Change Hooks
============
These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).
See also *Note Special Properties::, for how to detect changes to
specific parts of the text.
The functions you use in these hooks should save and restore the
match data if they do anything that uses regular expressions;
otherwise, they will interfere in bizarre ways with the editing
operations that call them.
- Variable: before-change-function
If this variable is non-`nil', then it should be a function; the
function is called before any buffer modification. Its arguments
are the beginning and end of the region that is going to change,
represented as integers. The buffer that's about to change is
always the current buffer.
- Variable: after-change-function
If this variable is non-`nil', then it should be a function; the
function is called after any buffer modification. It receives
three arguments: the beginning and end of the region just changed,
and the length of the text that existed before the change. (To
get the current length, subtract the region beginning from the
region end.) All three arguments are integers. The buffer that has
just changed is always the current buffer.
Both of these variables are temporarily bound to `nil' during the
time that either of these hooks is running. This means that if one of
these functions changes the buffer, that change won't run these
functions. If you do want the hook function to be run recursively,
write your hook functions to bind these variables back to their usual
values.
- Variable: first-change-hook
This variable is a normal hook; its hook functions are run using
`run-hooks' whenever a buffer is changed that was previously in
the unmodified state.
The variables described in this section are meaningful only starting
with Emacs version 19.
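A function meant for `after-change-function' might be sketched as
follows. It takes the three arguments described above and wraps its
own regular expression matching in `save-match-data', so the caller's
match data survives. The names `my-change-log' and `my-record-change'
are hypothetical, and the installation step appears only as a comment
because more recent Emacs versions may not consult this variable.

```elisp
(defvar my-change-log nil
  "Hypothetical list of (BEG END OLD-LEN) records, newest first.")

(defun my-record-change (beg end old-len)
  "Record one change without disturbing the caller's match data."
  (save-match-data
    ;; Without `save-match-data', this match would clobber the
    ;; match data of the editing operation that called the hook.
    (string-match "[0-9]+" (number-to-string old-len))
    (setq my-change-log
          (cons (list beg end old-len) my-change-log))))

;; Installation, following the text above:
;; (setq after-change-function 'my-record-change)

;; Calling the function directly shows that match data is preserved.
(string-match "b+" "aabbcc")
(my-record-change 1 5 0)
(match-beginning 0)
;; => 2
```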
File: elisp, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top
Searching and Matching
**********************
GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can identify the text matched by parts of
the regular expression by examining the "match data".
* Menu:
* String Search:: Search for an exact match.
* Regular Expressions:: Describing classes of strings.
* Regexp Search:: Searching for a match for a regexp.
* Replacement:: Internals of `query-replace'.
* Match Data:: Finding out which part of the text matched
various parts of a regexp, after regexp search.
* Standard Regexps:: Useful regexps for finding sentences, pages,...
* Searching and Case:: Case-independent or case-significant searching.
File: elisp, Node: String Search, Next: Regular Expressions, Up: Searching and Matching
Searching for Strings
=====================
These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string; LIMIT
and NOERROR are set to `nil', and REPEAT is set to 1.
- Command: search-forward STRING &optional LIMIT NOERROR REPEAT
This function searches forward from point for an exact match for
STRING. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found,
the value and side effects depend on NOERROR (see below).
In the following example, point is positioned at the beginning of
the line. Then `(search-forward "fox")' is evaluated in the
minibuffer and point is left after the last letter of `fox':
---------- Buffer: foo ----------
-!-The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
(search-forward "fox")
=> 20
---------- Buffer: foo ----------
The quick brown fox-!- jumped over the lazy dog.
---------- Buffer: foo ----------
The argument LIMIT specifies the upper bound to the search. (It
must be a position in the current buffer.) No match extending
after that position is accepted. If LIMIT is omitted or `nil', it
defaults to the end of the accessible portion of the buffer.
What happens when the search fails depends on the value of
NOERROR. If NOERROR is `nil', a `search-failed' error is
signaled. If NOERROR is `t', `search-forward' returns `nil' and
does nothing. If NOERROR is neither `nil' nor `t', then
`search-forward' moves point to the upper bound and returns `nil'.
(It would be more consistent now to return the new position of
point in that case, but some programs may depend on a value of
`nil'.)
If REPEAT is non-`nil', then the search is repeated that many
times. Point is positioned at the end of the last match.
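The effect of LIMIT, NOERROR and REPEAT can be sketched on the same
sentence. The function name `my-search-demo' is hypothetical, and
`with-temp-buffer' is assumed to be available.

```elisp
(defun my-search-demo ()
  "Hypothetical demo of LIMIT, NOERROR and REPEAT."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (let* ((found (search-forward "fox"))             ; end of "fox"
           ;; NOERROR is t: return nil and leave point alone.
           (missing (search-forward "cat" nil t))
           (after-missing (point))
           ;; "dog" lies beyond LIMIT 30; NOERROR is neither nil
           ;; nor t, so point moves to the limit.
           (limited (search-forward "dog" 30 'move))
           (after-limited (point))
           ;; REPEAT 3: stop after the third "o", the one in "over".
           (third-o (progn (goto-char (point-min))
                           (search-forward "o" nil t 3))))
      (list found missing after-missing
            limited after-limited third-o))))

(my-search-demo)
;; => (20 nil 20 nil 30 29)
```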
- Command: search-backward STRING &optional LIMIT NOERROR REPEAT
This function searches backward from point for STRING. It is just
like `search-forward' except that it searches backwards and leaves
point at the beginning of the match.
- Command: word-search-forward STRING &optional LIMIT NOERROR REPEAT
This function searches forward from point for a "word" match for
STRING. If it finds a match, it sets point to the end of the
match found, and returns the new value of point.
A word search differs from a simple string search in that a word
search *requires* that the words it searches for are present as
entire words (searching for the word `ball' does not match the word
`balls'), and punctuation and spacing are ignored (searching for
`ball boy' does match `ball. Boy!').
In this example, point is first placed at the beginning of the
buffer; the search leaves it between the `y' and the `!'.
---------- Buffer: foo ----------
-!-He said "Please! Find
the ball boy!"
---------- Buffer: foo ----------
(word-search-forward "Please find the ball, boy.")
=> 35
---------- Buffer: foo ----------
He said "Please! Find
the ball boy-!-!"
---------- Buffer: foo ----------
If LIMIT is non-`nil' (it must be a position in the current
buffer), then it is the upper bound to the search. The match
found must not extend after that position.
If NOERROR is `t', then `word-search-forward' returns `nil' when a
search fails, instead of signaling an error. If NOERROR is
neither `nil' nor `t', then `word-search-forward' moves point to
LIMIT (or the end of the buffer) and returns `nil'.
If REPEAT is non-`nil', then the search is repeated that many
times. Point is positioned at the end of the last match.
- Command: word-search-backward STRING &optional LIMIT NOERROR REPEAT
This function searches backward from point for a word match to
STRING. This function is just like `word-search-forward' except
that it searches backward and normally leaves point at the
beginning of the match.
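The word-matching rules can be sketched as follows. The search string
deliberately ends in a word character, since the handling of trailing
punctuation in the search string may differ between Emacs versions.
The function name `my-word-search-demo' is hypothetical, the buffer
text uses single spaces, and `with-temp-buffer' is assumed to be
available.

```elisp
(defun my-word-search-demo ()
  "Hypothetical demo: word search ignores punctuation and spacing."
  (with-temp-buffer
    (insert "He said \"Please! Find\nthe ball boy!\"")
    (goto-char (point-min))
    (let ((end (word-search-forward "Please find the ball, boy"
                                    nil t)))
      (goto-char (point-min))
      (list end
            ;; "bal" is not an entire word of the buffer text.
            (word-search-forward "bal" nil t)))))

(my-word-search-demo)
;; => (35 nil)
```

Position 35 lies between the `y' of `boy' and the `!' that follows it.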
File: elisp, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching
Regular Expressions
===================
A "regular expression" ("regexp", for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to
write regexps; the following section says how to search for them.
* Menu:
* Syntax of Regexps:: Rules for writing regular expressions.
* Regexp Example:: Illustrates regular expression syntax.
File: elisp, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions
Syntax of Regular Expressions
-----------------------------
Regular expressions have a syntax in which a few characters are
special constructs and the rest are "ordinary". An ordinary character
is a simple regular expression which matches that character and nothing
else. The special characters are `$', `^', `.', `*', `+', `?', `[',
`]' and `\'; no new special characters will be defined in the future.
Any other character appearing in a regular expression is ordinary,
unless a `\' precedes it.
For example, `f' is not a special character, so it is ordinary, and
therefore `f' is a regular expression that matches the string `f' and
no other string. (It does *not* match the string `ff'.) Likewise, `o'
is a regular expression that matches only `o'.
Any two regular expressions A and B can be concatenated. The result
is a regular expression which matches a string if A matches some amount
of the beginning of that string and B matches the rest of the string.
As a simple example, we can concatenate the regular expressions `f'
and `o' to get the regular expression `fo', which matches only the
string `fo'. Still trivial. To do something more powerful, you need
to use one of the special characters. Here is a list of them:
`. (Period)'
is a special character that matches any single character except a
newline. Using concatenation, we can make regular expressions
like `a.b' which matches any three-character string which begins
with `a' and ends with `b'.
`*'
is not a construct by itself; it is a suffix that means the
preceding regular expression is to be repeated as many times as
possible. In `fo*', the `*' applies to the `o', so `fo*' matches
one `f' followed by any number of `o's. The case of zero `o's is
allowed: `fo*' does match `f'.
`*' always applies to the *smallest* possible preceding
expression. Thus, `fo*' has a repeating `o', not a repeating `fo'.
The matcher processes a `*' construct by matching, immediately, as
many repetitions as can be found. Then it continues with the rest
of the pattern. If that fails, backtracking occurs, discarding
some of the matches of the `*'-modified construct in case that
makes it possible to match the rest of the pattern. For example,
matching `ca*ar' against the string `caaar', the `a*' first tries
to match all three `a's; but the rest of the pattern is `ar' and
there is only `r' left to match, so this try fails. The next
alternative is for `a*' to match only two `a's. With this choice,
the rest of the regexp matches successfully.
`+'
is a suffix character similar to `*' except that it must match the
preceding expression at least once. So, for example, `ca+r' will
match the strings `car' and `caaaar' but not the string `cr',
whereas `ca*r' would match all three strings.
`?'
is a suffix character similar to `*' except that it can match the
preceding expression either once or not at all. For example,
`ca?r' will match `car' or `cr'; nothing else.
`[ ... ]'
`[' begins a "character set", which is terminated by a `]'. In
the simplest case, the characters between the two form the set.
Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
matches any string composed of just `a's and `d's (including the
empty string), from which it follows that `c[ad]*r' matches `cr',
`car', `cdr', `caddaar', etc.
Character ranges can also be included in a character set, by
writing two characters with a `-' between them. Thus, `[a-z]'
matches any lower case letter. Ranges may be intermixed freely
with individual characters, as in `[a-z$%.]', which matches any
lower case letter or `$', `%' or a period.
Note that the usual special characters are not special any more
inside a character set. A completely different set of special
characters exists inside character sets: `]', `-' and `^'.
To include a `]' in a character set, make it the first character.
For example, `[]a]' matches `]' or `a'. To include a `-', write
`-' as the first or last character of the set.
To include `^', make it other than the first character in the set.
`[^ ... ]'
`[^' begins a "complement character set", which matches any
character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
all characters *except* letters and digits.
`^' is not special in a character set unless it is the first
character. The character following the `^' is treated as if it
were first (thus, `-' and `]' are not special there).
Note that a complement character set can match a newline, unless
newline is mentioned as one of the characters not to match.
`^'
is a special character that matches the empty string, but only at
the beginning of a line in the text being matched. Otherwise it
fails to match anything. Thus, `^foo' matches a `foo' which occurs
at the beginning of a line.
When matching a string, `^' matches at the beginning of the string
or after a newline character `\n'.
`$'
is similar to `^' but matches only at the end of a line. Thus,
`x+$' matches a string of one `x' or more at the end of a line.
When matching a string, `$' matches at the end of the string or
before a newline character `\n'.
`\'
has two functions: it quotes the special characters (including
`\'), and it introduces additional special constructs.
Because `\' quotes special characters, `\$' is a regular
expression which matches only `$', and `\[' is a regular
expression which matches only `[', and so on.
Note that `\' also has special meaning in the read syntax of Lisp
strings (*note String Type::.), and must be quoted with `\'. For
example, the regular expression that matches the `\' character is
`\\'. To write a Lisp string that contains the characters `\\',
Lisp syntax requires you to quote each `\' with another `\'.
Therefore, the read syntax for a regular expression matching `\'
is `"\\\\"'.
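The special characters described above can be tried out with
`string-match'. This is only a sketch: `my-full-match-p' is a
hypothetical helper, not an Emacs function, which wraps the regexp in
`\`\( ... \)\'' so that it must match the whole string. It binds
`case-fold-search' to `nil' so that case is significant.

```elisp
;; Hypothetical helper (not part of Emacs): return t if REGEXP matches
;; all of STRING, treating upper and lower case as distinct.
(defun my-full-match-p (regexp string)
  (let ((case-fold-search nil))
    (and (string-match (concat "\\`\\(" regexp "\\)\\'") string) t)))

(my-full-match-p "a.b" "axb")            ; => t
(my-full-match-p "a.b" "a\nb")           ; => nil, `.' skips newline
(my-full-match-p "fo*" "f")              ; => t, zero `o's allowed
(my-full-match-p "ca*ar" "caaar")        ; => t, after backtracking
(my-full-match-p "ca+r" "cr")            ; => nil, `+' needs one `a'
(my-full-match-p "ca?r" "cr")            ; => t
(my-full-match-p "c[ad]*r" "caddaar")    ; => t
(my-full-match-p "[]a]" "]")             ; => t, `]' is first in set
(my-full-match-p "[a-z$%.]" "%")         ; => t
(my-full-match-p "[a-z$%.]" "A")         ; => nil
(my-full-match-p "[^a-z0-9A-Z]" "\n")    ; => t, complement matches newline

;; `^' and `$' match at line boundaries inside a string.
(string-match "^foo" "bar\nfoo")         ; => 4
(string-match "x+$" "axx\nb")            ; => 1
```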
*Please note:* for historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act. It is
poor practice to depend on this behavior; better to quote the special
character anyway, regardless of where it appears.
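As a brief illustration of quoting, the forms below compare an
unquoted leading `*' with its quoted form and show the Lisp read
syntax for regexps that match `$' and `\'. It is a sketch using only
`string-match'; the sample strings are made up for the example.

```elisp
;; A leading `*' has nothing to act on, so it is treated as ordinary.
(string-match "*foo" "a*foo")        ; => 1

;; The quoted form says the same thing explicitly and is preferred.
(string-match "\\*foo" "a*foo")      ; => 1

;; `\$' as a regexp is written "\\$" in Lisp read syntax.
(string-match "\\$" "cost: $5")      ; => 6

;; The regexp `\\' (two characters) is written "\\\\" in Lisp.
(length "\\\\")                      ; => 2
(string-match "\\\\" "a\\b")         ; => 1
```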
For the most part, `\' followed by any character matches only that
character. However, there are several exceptions: characters which,
when preceded by `\', are special constructs. Such characters are
always ordinary when encountered on their own. Here is a table of `\'
constructs:
`\|'
specifies an alternative. Two regular expressions A and B with
`\|' in between form an expression that matches anything that
either A or B matches.
Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
`\|' applies to the largest possible surrounding expressions.
Only a surrounding `\( ... \)' grouping can limit the grouping
power of `\|'.
Full backtracking capability exists to handle multiple uses of
`\|'.
`\( ... \)'
is a grouping construct that serves three purposes:
1. To enclose a set of `\|' alternatives for other operations.
Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
2. To enclose a complicated expression for a suffix character
such as `*' to operate on. Thus, `ba\(na\)*' matches
`bananana', etc., with any (zero or more) number of `na'
strings.
3. To record a matched substring for future reference.
This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature which happens to
be assigned as a second meaning to the same `\( ... \)' construct
because there is no conflict in practice between the two meanings.
Here is an explanation of this feature:
`\DIGIT'
matches the same text which is matched the DIGITth time by a
previous `\( ... \)' construct.
In other words, after the end of a `\( ... \)' construct, the
matcher remembers the beginning and end of the text matched by
that construct. Then, later on in the regular expression, you can
use `\' followed by DIGIT to mean "match the same text matched the
DIGITth time by the `\( ... \)' construct."
The strings matching the first nine `\( ... \)' constructs
appearing in a regular expression are assigned numbers 1 through 9
in the order that the open parentheses appear in the regular
expression. So you can use `\1' through `\9' to refer to the text
matched by the corresponding `\( ... \)' constructs.
For example, `\(.*\)\1' matches any newline-free string that is
composed of two identical halves. The `\(.*\)' matches the first
half, which may be anything, but the `\1' that follows must match
the same exact text.
`\`'
matches the empty string, provided it is at the beginning of the
buffer or string being matched against.
`\''
matches the empty string, provided it is at the end of the buffer
or string being matched against.
`\='
matches the empty string, provided it is at point. (This
construct is not defined when matching against a string.)
`\b'
matches the empty string, provided it is at the beginning or end
of a word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
separate word. `\bballs?\b' matches `ball' or `balls' as a
separate word.
`\B'
matches the empty string, provided it is *not* at the beginning or
end of a word.
`\<'
matches the empty string, provided it is at the beginning of a
word.
`\>'
matches the empty string, provided it is at the end of a word.
`\w'
matches any word-constituent character. The editor syntax table
determines which characters these are. *Note Syntax Tables::.
`\W'
matches any character that is not a word-constituent.
`\sCODE'
matches any character whose syntax is CODE. Here CODE is a
character which represents a syntax code: thus, `w' for word
constituent, `-' for whitespace, `(' for open parenthesis, etc.
*Note Syntax Tables::, for a list of the codes.
`\SCODE'
matches any character whose syntax is not CODE.
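The `\' constructs in the table above can be explored in the same
way. In this sketch, `my-matched-text' is a hypothetical helper, not
an Emacs function. It assumes that `with-syntax-table' and
`standard-syntax-table' are available, and uses them so that the
word and whitespace constructs do not depend on the current buffer's
syntax table.

```elisp
;; Hypothetical helper (not part of Emacs): return the text in STRING
;; matched by REGEXP, or nil if there is no match.  Case is
;; significant and the standard syntax table is used.
(defun my-matched-text (regexp string)
  (let ((case-fold-search nil))
    (with-syntax-table (standard-syntax-table)
      (and (string-match regexp string)
           (match-string 0 string)))))

;; Alternation, limited by a grouping.
(my-matched-text "\\(foo\\|bar\\)x" "a barx")        ; => "barx"
;; Without grouping, the alternatives are `foo' and `barx'.
(my-matched-text "foo\\|barx" "foox")                ; => "foo"
;; A suffix operating on a group.
(my-matched-text "ba\\(na\\)*" "bananana")           ; => "bananana"
;; Back reference: two identical halves.
(my-matched-text "\\`\\(.*\\)\\1\\'" "abcabc")       ; => "abcabc"
(my-matched-text "\\`\\(.*\\)\\1\\'" "abcabd")       ; => nil
;; Word boundaries and word constituents.
(my-matched-text "\\bballs?\\b" "footballs, balls")  ; => "balls"
(my-matched-text "\\Bball" "football")               ; => "ball"
(my-matched-text "\\<\\w+\\>" "  hello, world")      ; => "hello"
(my-matched-text "\\W+" "ab, cd")                    ; => ", "
;; Syntax code `-' stands for whitespace.
(my-matched-text "\\s-+" "ab  cd")                   ; => "  "
;; `\`' matches only at the very beginning, unlike `^'.
(my-matched-text "\\`foo" "a\nfoo")                  ; => nil
(my-matched-text "^foo" "a\nfoo")                    ; => "foo"
```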
Not every string is a valid regular expression. For example, any
string with unbalanced square brackets is invalid, and so is a string
that ends with a single `\'. If an invalid regular expression is
passed to any of the search functions, an `invalid-regexp' error is
signaled.
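A program that accepts regexps from elsewhere may want to catch this
error. The sketch below uses `condition-case' for that purpose;
`my-regexp-error' is a hypothetical helper, not an Emacs function,
and the exact contents of the error data are not assumed here.

```elisp
;; Hypothetical helper (not part of Emacs): return nil if REGEXP is
;; valid, otherwise the data of the `invalid-regexp' error.
(defun my-regexp-error (regexp)
  (condition-case err
      (progn (string-match regexp "") nil)
    (invalid-regexp (cdr err))))

(my-regexp-error "[a-z]+")    ; => nil, a valid regexp
(my-regexp-error "[a-z")      ; non-nil: unbalanced square bracket
(my-regexp-error "foo\\")     ; non-nil: ends with a single `\'
```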
- Function: regexp-quote STRING
This function returns a regular expression string which matches
exactly STRING and nothing else. This allows you to request an
exact string match when calling a function that wants a regular
expression.
(regexp-quote "^The cat$")
=> "\\^The cat\\$"
One use of `regexp-quote' is to combine an exact string match with
context described as a regular expression. For example, this
searches for the string which is the value of `string', surrounded
by whitespace:
(re-search-forward
(concat "\\s " (regexp-quote string) "\\s "))
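To see what the quoting buys, the sketch below runs the same kind of
search with and without `regexp-quote'. The helper
`my-search-delimited' is hypothetical, the sketch assumes
`with-temp-buffer' is available, and it writes the whitespace syntax
class as `\s-'.

```elisp
;; Hypothetical helper (not part of Emacs): insert TEXT in a temporary
;; buffer and search from the beginning for REGEXP surrounded by
;; whitespace.  Returns the position after the match, or nil.
(defun my-search-delimited (regexp text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (re-search-forward (concat "\\s-" regexp "\\s-") nil t)))

;; Quoted: only the literal text `a.b' can match.
(my-search-delimited (regexp-quote "a.b") " axb a.b c")   ; => 10

;; Unquoted: `.' matches any character, so `axb' is found first.
(my-search-delimited "a.b" " axb a.b c")                  ; => 6
```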
File: elisp, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions
Complex Regexp Example
----------------------
Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows. It is the value of
the variable `sentence-end'.
First, we show the regexp as a string in Lisp syntax to enable you to
distinguish the spaces from the tab characters. The string constant
begins and ends with a double-quote. `\"' stands for a double-quote as
part of the string, `\\' for a backslash as part of the string, `\t'
for a tab and `\n' for a newline.
"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
In contrast, if you evaluate the variable `sentence-end', you will
see the following:
sentence-end
=>
"[.?!][]\"')}]*\\($\\| \\| \\)[
]*"
In this case, the tab and newline are the actual characters.
This regular expression contains four parts in succession and can be
deciphered as follows:
`[.?!]'
The first part of the pattern consists of three characters, a
period, a question mark and an exclamation mark, within square
brackets. The match must begin with one of these three characters.
`[]\"')}]*'
The second part of the pattern matches any closing braces and
quotation marks, zero or more of them, that may follow the period,
question mark or exclamation mark. The `\"' is Lisp syntax for a
double-quote in a string. The `*' at the end indicates that the
immediately preceding regular expression (a character set, in this
case) may be repeated zero or more times.
`\\($\\|\t\\|  \\)'
The third part of the pattern matches the whitespace that follows
the end of a sentence: the end of a line, or a tab, or two spaces.
The double backslashes are needed because `\' must itself be
quoted in a Lisp string; the resulting `\(', `\|' and `\)' are
regexp constructs rather than ordinary parentheses and vertical
bars. The parentheses mark the group and the vertical bars
indicate that the patterns on either side of them are
alternatives. The dollar sign is used to match the end of a line.
The tab character is written using `\t' and the two spaces are
written as themselves.
`[ \t\n]*'
Finally, the last part of the pattern indicates that the end of
the line or the whitespace following the period, question mark or
exclamation mark may, but need not, be followed by additional
whitespace.
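The four parts can be seen at work by applying the regexp to a few
strings. This is a sketch only: `my-sentence-end' holds a copy of the
regexp shown above (with two spaces in the last alternative) and
`my-sentence-end-span' is a hypothetical helper; neither name is part
of Emacs.

```elisp
;; Copy of the regexp discussed above, under a hypothetical name so
;; that the real `sentence-end' variable is left untouched.
(defvar my-sentence-end "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*")

;; Hypothetical helper: return (START . END) for the first sentence
;; end found in STRING, or nil if there is none.
(defun my-sentence-end-span (string)
  (when (string-match my-sentence-end string)
    (cons (match-beginning 0) (match-end 0))))

;; Question mark, closing quote, then two spaces.
(my-sentence-end-span "Is it?\"  Yes.")   ; => (5 . 9)

;; A period followed by a single space is not a sentence end.
(my-sentence-end-span "Hi. There")        ; => nil

;; A period at the end of the string matches through `$'.
(my-sentence-end-span "End.")             ; => (3 . 4)
```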